- Title
- Synthetic Data as a Strategy to Resolve Data Privacy and Confidentiality Concerns in the Sport Sciences: Practical Examples and an R Shiny Application
- Creator
- Naughton, Mitchell; Weaving, Dan; Scott, Tannath; Compton, Heidi
- Relation
- International Journal of Sports Physiology and Performance Vol. 18, Issue 10, p. 1213-1218
- Publisher Link
- http://dx.doi.org/10.1123/ijspp.2023-0007
- Publisher
- Human Kinetics
- Resource Type
- journal article
- Date
- 2023
- Description
- Purpose: There has been a proliferation of technologies in the sport performance environment that collect increasingly larger quantities of athlete data. These data have the potential to be personal, sensitive, and revealing and raise privacy and confidentiality concerns. A solution may be the use of synthetic data, which mimic the properties of the original data. The aim of this study was to provide examples of synthetic data generation to demonstrate its practical use and to deploy a freely available web-based R Shiny application to generate synthetic data. Methods: Openly available data from 2 previously published studies were obtained, representing typical data sets of (1) field- and gym-based team-sport external and internal load during a preseason period (n = 28) and (2) performance and subjective changes from before to after the posttraining intervention (n = 22). Synthetic data were generated using the synthpop package in RStudio software, and comparisons between the original and synthetic data sets were made through Welch t tests and the distributional similarity standardized propensity mean squared error statistic. Results: There were no significant differences between the original and synthetic data sets across all variables examined in both data sets (P > .05). Further, there was distributional similarity (ie, low standardized propensity mean squared error) between the original and synthetic data sets. Conclusions: These findings highlight the potential use of synthetic data as a practical solution to privacy and confidentiality issues. Synthetic data can unlock previously inaccessible data sets for exploratory analysis and facilitate multiteam or multicenter collaborations. Interested sport scientists, practitioners, and researchers should consider utilizing the shiny web application (SYNTHETIC DATA, available at https://assetlab.shinyapps.io/SyntheticData/).
- Subject
- sport performance; technology; hypothesis generation; data analysis; simulation
- Identifier
- http://hdl.handle.net/1959.13/1487418
- Identifier
- uon:52138
- Identifier
- ISSN:1555-0273
- Rights
- © 2023 The Authors. Published by Human Kinetics, Inc. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, CC BY-NC 4.0, which permits the copy and redistribution in any medium or format, provided it is not used for commercial purposes, the original work is properly cited, the new use includes a link to the license, and any changes are indicated. See http://creativecommons.org/licenses/by-nc/4.0. This license does not cover any third-party material that may appear with permission in the article. For commercial use, permission should be requested from Human Kinetics, Inc., through the Copyright Clearance Center (http://www.copyright.com).
- Language
- eng
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
ATTACHMENT02 | Publisher version (open access) | 1 MB | Adobe Acrobat PDF
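
Illustrative note: the abstract describes comparing original and synthetic variables with Welch t tests. The sketch below shows what that comparison computes, under loud assumptions: it is written in Python rather than R, it does not use synthpop, and the synthesizer is a simple stand-in that samples from a normal distribution fitted to one column (not the method used in the article). The data values, the seed, and the function names `welch_t` and `synthesize_normal` are hypothetical and exist only for this example. The propensity mean squared error statistic is not covered here.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard error of each sample mean (sample variance / n).
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


def synthesize_normal(column, rng):
    """Stand-in synthesizer (assumption): sample from a fitted normal."""
    mu = statistics.fmean(column)
    sigma = statistics.stdev(column)
    return [rng.gauss(mu, sigma) for _ in column]


rng = random.Random(2023)  # fixed seed so the sketch is repeatable
# Hypothetical "original" variable for 28 athletes (not data from the article).
original = [rng.gauss(500.0, 60.0) for _ in range(28)]
synthetic = synthesize_normal(original, rng)

t_stat, df = welch_t(original, synthetic)
print(f"Welch t = {t_stat:.3f}, df = {df:.1f}")
```

A t statistic near zero relative to its degrees of freedom is consistent with similar means in the two columns. Turning it into a P value needs a t-distribution function, which the Python standard library does not provide, so that step is left out of this sketch.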